The Robustness of Domain Lexico-Taxonomy: Expanding Domain Lexicon with CiLin
Authors
Abstract
This paper deals with the robust expansion of Domain Lexico-Taxonomy (DLT). DLT is a domain taxonomy enriched with domain lexica, proposed as an infrastructure for crossing domain barriers (Huang et al. 2004). The proposal rests on the observation that domain lexica contain entries that are also part of a general lexicon. When entries of a general lexicon are marked with their associated domain attributes, this information has two important applications. First, the DLT serves as a seed for domain lexica. Second, the DLT offers the most reliable evidence for deciding the domain of a new text, since these lexical clues belong to the general lexicon and occur reliably in all texts. General lexicon lemmas are therefore extracted to populate domain lexica, which are situated in a domain taxonomy. Building on this previous work, we show that the original DLT can be expanded further when a new language resource is introduced. We applied CiLin, a Chinese thesaurus, and added more than 1000 new entries to the DLT. Our evaluation shows that the DLT approach is robust, since both the size and the number of domain lexica increased effectively.
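The abstract does not spell out how thesaurus entries are mapped to domains, so the following is only a minimal sketch under one assumed reading: a lemma that is not yet in the DLT inherits the domain labels of any seed lemma sharing its thesaurus synonym group. The function name `expand_domain_lexicon`, the data layout, and the toy English entries are all hypothetical and are not taken from the paper or from CiLin itself.

```python
from collections import defaultdict


def expand_domain_lexicon(seed_lexicon, synonym_groups):
    """Propose new domain-lexicon entries from thesaurus synonym groups.

    seed_lexicon: dict mapping a lemma to the set of domain labels it
        already carries (hypothetical stand-in for the existing DLT).
    synonym_groups: iterable of sets of lemmas (hypothetical stand-in
        for synonym groups in a thesaurus such as CiLin).
    Returns a dict mapping each newly added lemma to its inherited domains.
    """
    added = defaultdict(set)
    for group in synonym_groups:
        # Collect the domains carried by seed lemmas in this group.
        domains = set()
        for lemma in group:
            domains |= seed_lexicon.get(lemma, set())
        if not domains:
            continue  # no seed lemma here, so nothing to propagate
        # Assumption: non-seed members inherit the group's seed domains.
        for lemma in group:
            if lemma not in seed_lexicon:
                added[lemma] |= domains
    return dict(added)


if __name__ == "__main__":
    # Toy data for illustration only.
    seed = {"physician": {"medicine"}, "loan": {"finance"}}
    groups = [{"physician", "doctor"}, {"loan", "credit"}, {"river", "stream"}]
    print(expand_domain_lexicon(seed, groups))
```

In this sketch, groups without any seed lemma contribute nothing, which keeps the expansion anchored to domain labels that already exist in the lexicon.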
Similar resources
Domain Lexico-Taxonomy: An Approach Towards Multi-domain Language Processing
This paper deals with the domain barrier issues in language processing. Our work centers on Domain Lexico-Taxonomy (DLT), a domain taxonomy enhanced by domain lexicons. We propose DLT as an infrastructure for crossing domain barriers. By using DLT with WordNet and Domain Taxonomy, we obtain 15160 Chinese lemmas in 463 domains. We estimate the accuracy of the lemmas in five domains, and get 89.74% in...
Objects Identification in Object-Oriented Software Development - A Taxonomy and Survey on Techniques
Object-oriented analysis and design is one of the modern paradigms for developing a system. In this paradigm, there are several objects, and each object plays specific roles. Identifying objects (and classes) is one of the most important steps in the object-oriented paradigm. This paper reviews the literature on techniques for identifying objects and then presents six taxonomies for them. The f...
Semantic Atomicity and Multilinguality in the Medical Domain: Design Considerations for the MorphoSaurus Subword Lexicon
We present the lexico-semantic foundations underlying a multilingual lexicon the entries of which are constituted by so-called subwords. These subwords reflect semantic atomicity constraints in the medical domain which diverge from canonical lexicological understanding in NLP. We focus here on criteria to identify and delimit reasonable subword units, to group them into functionally adequate sy...
Content Evaluation of Iranian EFL Textbook Vision 1 Based on Bloom’s Revised Taxonomy of Cognitive Domain
Textbooks are a common feature of classrooms and an important means of contributing to curricula. Their contents are therefore essential to adequate curriculum planning. Textbook analysis is a means by which different features of textbooks can be analyzed and their effectiveness validated. This study set out to evaluate the content of...
Generating a Resource for Products and Brandnames Recognition. Application to the Cosmetic Domain
The Named Entity Recognition task needs high-quality, large-scale resources. In this paper, we present RENCO, a rule-based system focused on the recognition of entities in the Cosmetic domain (brandnames, product names, ...). RENCO has two main objectives: 1) generating resources for named entity recognition; 2) mining new named entities relying on the previously generated resources. In order to ...